Unsupervised Learning: Trade&Ahead

Marks: 60

Context

The stock market has consistently proven to be a good place to invest in and save for the future. There are many compelling reasons to invest in stocks: it can help fight inflation, create wealth, and provide some tax benefits. Steady returns on investments over a long period can also grow far more than seems possible. Thanks to the power of compounding, the earlier one starts investing, the larger the corpus one can build for retirement. Overall, investing in stocks can help meet life's financial aspirations.

It is important to maintain a diversified portfolio when investing in stocks in order to maximise earnings under any market condition. A diversified portfolio tends to yield higher returns and face lower risk by tempering potential losses when the market is down. It is easy to get lost in a sea of financial metrics when determining the worth of a stock, and doing the same for a multitude of stocks to identify the right picks for an individual can be tedious. Cluster analysis can identify stocks that exhibit similar characteristics and stocks that exhibit minimal correlation. This helps investors better analyze stocks across different market segments and protect against risks that could make the portfolio vulnerable to losses.

Objective

Trade&Ahead is a financial consultancy firm that provides its customers with personalized investment strategies. They have hired you as a Data Scientist and provided you with data comprising stock prices and some financial indicators for a few companies listed on the New York Stock Exchange. They have assigned you the tasks of analyzing the data, grouping the stocks based on the attributes provided, and sharing insights about the characteristics of each group.

Data Dictionary

Importing necessary libraries and data

Observation: There are 340 rows and 15 columns.

Observation: There are no missing values.

Observation: There are no duplicate values.

Observation: There are 7 float, 4 int and 4 object columns, which accounts for all 15 columns.
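The checks behind the observations above can be sketched roughly as follows. This is a minimal illustration on a tiny hypothetical DataFrame, not the actual data set; the column names and values are assumptions made only so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the stock data (the real data has 340 rows and 15 columns).
df = pd.DataFrame({
    "Security": ["A Corp", "B Inc", "C Ltd"],
    "GICS Sector": ["Energy", "Health Care", "Energy"],
    "Current Price": [42.3, 59.2, 130.9],
    "ROE": [135, 21, 9],
})

n_rows, n_cols = df.shape                    # dimensions of the data
missing_per_column = df.isnull().sum()       # missing values in each column
n_duplicates = int(df.duplicated().sum())    # number of fully duplicated rows
dtype_counts = df.dtypes.value_counts()      # how many columns have each data type

print(f"{n_rows} rows, {n_cols} columns")
print(missing_per_column)
print(f"duplicate rows: {n_duplicates}")
print(dtype_counts)
```

On the real data, the same four calls would be applied to the DataFrame loaded from the provided file.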

Observation:

Observations:

Observation:

As mentioned above

Observation:

Exploratory Data Analysis (EDA)

Observation:

Current Price: The distribution is heavily right-skewed. Most prices are below $200, with very few above $200. One stock is priced above $1200 and is an outlier.

Price Change: The distribution is close to normal.

Volatility: The distribution is right-skewed; some stocks show very high volatility and appear as outliers.

ROE: The distribution is heavily right-skewed. Most of the points are under 80, and there are possible outliers.

Cash Ratio: The distribution is right-skewed, and most of the points show a cash ratio below 200.

Net Cash Flow: The distribution is approximately normal with possible outliers.

Net Income: The distribution is approximately normal with possible outliers.

Earnings Per Share: The distribution is approximately normal with possible outliers.

Estimated Shares Outstanding: The distribution is heavily right-skewed, and a few companies have shares outstanding in the billions.

P/E Ratio: The distribution is heavily right-skewed, and a few companies have a P/E ratio greater than 100.

P/B Ratio: The distribution is approximately normal with possible outliers.
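The "skewed to the right" and "close to normal" readings above come from looking at the plots, but they can also be backed by a number. The sketch below computes sample skewness with pandas on synthetic data; the two columns are hypothetical stand-ins (one log-normal, one normal) and do not reproduce the real values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins: a log-normal column has a long right tail,
# a normal column is roughly symmetric.
df = pd.DataFrame({
    "Current Price": rng.lognormal(mean=4.0, sigma=0.8, size=340),
    "Price Change": rng.normal(loc=4.0, scale=12.0, size=340),
})

# Skewness well above 0 indicates a right-skewed distribution;
# values near 0 indicate a roughly symmetric one.
skewness = df.skew()
print(skewness.round(2))
```

A value near zero does not prove normality, so it is best read together with the histogram and box plot.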

Labelled Bar Plots

GICS Sector

Observation:

GICS Sub Industry

Observation:

Bivariate Analysis

Observation:

Observation:

Answering the below Questions

Questions:

1. What does the distribution of stock prices look like?

Observation:

2. The stocks of which economic sector have seen the maximum price increase on average?

Observation:

3. How are the different variables correlated with each other?

Observation:

4. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?

Observation:

5. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?

Observation:
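Questions 2 to 5 all reduce to either a per-sector average or a correlation matrix. A minimal sketch of both is given here on a small made-up sample; the column names are assumed to match the data set and the numbers are purely illustrative.

```python
import pandas as pd

# Tiny hypothetical sample; values are invented for illustration only.
df = pd.DataFrame({
    "GICS Sector": ["Energy", "Energy", "Health Care", "Health Care", "Utilities"],
    "Price Change": [-10.0, -6.0, 9.0, 11.0, 1.0],
    "Cash Ratio": [40, 60, 100, 120, 15],
    "P/E Ratio": [70.0, 74.0, 40.0, 42.0, 18.0],
})

# Average of each metric per economic sector, sorted by average price change.
sector_means = (
    df.groupby("GICS Sector")[["Price Change", "Cash Ratio", "P/E Ratio"]]
    .mean()
    .sort_values("Price Change", ascending=False)
)

# Sector with the largest average price increase.
top_sector = sector_means["Price Change"].idxmax()

# Pairwise Pearson correlations between the numeric columns.
correlations = df.select_dtypes("number").corr()

print(sector_means)
print("Top sector by average price change:", top_sector)
print(correlations.round(2))
```

The correlation matrix is the kind of table that is usually drawn as a heatmap.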

Summary of the EDA

Data Description:

Data that requires Preprocessing:

Data Preprocessing

There are no missing or duplicate values.

Outlier check

Observation:
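One common way to flag outliers is the 1.5 × IQR rule that box plots use for their whiskers. The sketch below assumes that rule; `iqr_outlier_mask` is a hypothetical helper written for this illustration, and the prices are made up (with one extreme value, in the spirit of the single very expensive stock noted in the EDA).

```python
import pandas as pd

def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Made-up prices with one extreme value.
prices = pd.Series([20, 35, 48, 60, 75, 90, 110, 1275], name="Current Price")

mask = iqr_outlier_mask(prices)
print(prices[mask])
```

Whether such points should be treated or kept is a separate decision; for stock data, extreme values are often genuine and may be worth keeping.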

Scaling

Observation:
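Distance-based clustering is sensitive to the scale of each attribute, which is why the numeric columns are scaled first. The sketch below assumes z-score standardization with scikit-learn's `StandardScaler`; the notebook text does not say which scaler was used, and the data here is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic numeric columns on very different scales (hypothetical values).
num_df = pd.DataFrame({
    "Current Price": rng.lognormal(4.0, 0.8, 340),
    "Net Income": rng.normal(1.5e9, 4e9, 340),
})

# Standardize each column to mean 0 and standard deviation 1.
scaler = StandardScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(num_df), columns=num_df.columns)

print(scaled_df.describe().round(2))
```

After scaling, a column measured in billions no longer dominates the distance computation over a column measured in dollars.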

EDA after Data Preprocessing

Observation:

Let's look at the histograms and box plots.

Observation:

Observation:

K-means Clustering

Observation:
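A usual first step for choosing the number of clusters is the elbow method: fit K-means for a range of k and look for the point where the within-cluster sum of squares stops dropping sharply. The sketch below is one way to do that, assuming scikit-learn and using synthetic blobs from `make_blobs` in place of the scaled stock data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled data: 340 points, 11 numeric features.
X, _ = make_blobs(n_samples=340, centers=4, n_features=11, random_state=0)

# Within-cluster sum of squares (inertia) for each candidate k.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

Plotting `inertias` against k gives the elbow curve; on real data the bend is often less clear than on synthetic blobs.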

Let's check the silhouette scores.

Observation:

Observation:
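Silhouette scores can be computed for each candidate k in a similar loop. This is a sketch under the same assumptions as before: synthetic blobs stand in for the scaled stock data, and the range of k is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled data.
X, _ = make_blobs(n_samples=340, centers=4, n_features=11, random_state=0)

# The silhouette score needs at least 2 clusters, so k starts at 2.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("k with the highest silhouette score:", best_k)
```

Scores closer to 1 indicate tighter, better-separated clusters. The highest score is a guide rather than a rule, and is usually weighed against the elbow curve and how interpretable the clusters are.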

K-means Model

Cluster Profiles
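A cluster profile is essentially the per-cluster mean of each attribute plus the number of members. A minimal sketch on six hypothetical companies follows; the column name `KM_cluster` is an assumption, and the features are left unscaled only to keep the example short.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Six made-up companies forming two obvious groups.
df = pd.DataFrame({
    "Security": list("ABCDEF"),
    "Volatility": [0.9, 1.0, 1.1, 3.0, 3.1, 2.9],
    "Earnings Per Share": [4.0, 4.2, 3.8, -6.0, -5.5, -6.5],
})
features = ["Volatility", "Earnings Per Share"]

# Assign each company to a cluster (unscaled here for brevity).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["KM_cluster"] = kmeans.fit_predict(df[features])

# Profile: mean of each feature per cluster, plus the cluster size.
profile = df.groupby("KM_cluster")[features].mean()
profile["count"] = df.groupby("KM_cluster")["Security"].count()

print(profile)
```

Reading across a row of `profile` gives the typical company in that cluster, which is what the per-cluster commentary is based on.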

K-means Cluster Model Box Plots

Observation:

Cluster 0

Cluster 2

Cluster 3

Hierarchical Clustering

Observation:

Observation:
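One common way to compare linkage methods is the cophenetic correlation, which measures how faithfully the hierarchy preserves the original pairwise distances. The sketch below assumes SciPy and Euclidean distance and runs on synthetic data; it may or may not match the exact set of distance metrics compared in this notebook.

```python
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs

# Small synthetic stand-in for the scaled data.
X, _ = make_blobs(n_samples=60, centers=4, n_features=5, random_state=0)

# Condensed matrix of original pairwise distances.
pairwise = pdist(X, metric="euclidean")

# Cophenetic correlation for each linkage method (Ward requires Euclidean).
coph = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    c, _ = cophenet(Z, pairwise)
    coph[method] = c

best_method = max(coph, key=coph.get)
for method, c in coph.items():
    print(f"{method}: {c:.3f}")
print("Highest cophenetic correlation:", best_method)
```

A high cophenetic correlation does not guarantee useful clusters, so it is normally read alongside the dendrograms.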

Dendrograms for all the linkage methods

Observation:

Observation:
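Alongside the dendrograms, it can help to cut each tree into the same number of clusters and compare the cluster sizes, since some linkage methods tend to produce one large cluster and several tiny ones. The sketch below assumes SciPy, uses synthetic data, and cuts at 4 clusters as an illustrative choice. `dendrogram` is called with `no_plot=True` so that the example does not need a plotting backend.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_blobs

# Small synthetic stand-in for the scaled data.
X, _ = make_blobs(n_samples=60, centers=4, n_features=5, random_state=0)

sizes = {}
leaf_counts = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)

    # Cut the tree into at most 4 flat clusters (labels start at 1).
    labels = fcluster(Z, t=4, criterion="maxclust")
    sizes[method] = sorted(np.bincount(labels)[1:].tolist(), reverse=True)

    # Build the dendrogram layout without drawing it.
    tree = dendrogram(Z, no_plot=True)
    leaf_counts[method] = len(tree["leaves"])

for method, s in sizes.items():
    print(f"{method}: cluster sizes {s}")
```

Dropping `no_plot=True` draws the dendrogram when matplotlib is available.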

Hierarchical Clustering Model

Model with Average Linkage

Cluster Profiling

Observation:

Model with Complete Linkage

Cluster Profiling

Observation:

Model with Single Linkage

Cluster Profiling

Observation:

Model with Ward Linkage

Cluster Profiling

Observation:

Considering the distinct dendrogram and clusters, Ward linkage gives the best model.

Let's look at the clusters and the company names in each cluster.
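Listing the companies in each cluster can be sketched as follows, assuming scikit-learn's `AgglomerativeClustering` with Ward linkage. The six companies, their values, and the `HC_cluster` column name are hypothetical, and two clusters are used only because the sample is so small.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Six made-up companies forming two obvious groups.
df = pd.DataFrame({
    "Security": ["A Corp", "B Inc", "C Ltd", "D Co", "E Plc", "F Group"],
    "Volatility": [0.9, 1.0, 1.1, 3.0, 3.1, 2.9],
    "Price Change": [5.0, 6.0, 4.0, -20.0, -22.0, -18.0],
})
features = ["Volatility", "Price Change"]

# Ward linkage merges the pair of clusters that least increases within-cluster variance.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
df["HC_cluster"] = model.fit_predict(df[features])

# Company names grouped by cluster label.
names_by_cluster = df.groupby("HC_cluster")["Security"].apply(list)

for cluster, names in names_by_cluster.items():
    print(f"Cluster {cluster}: {names}")
```

On the real data the same grouping would be applied after fitting on the scaled attributes.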

Hierarchical Model Box Plots

Observation:

Cluster 0

Cluster 1

Cluster 2

Cluster 3

Dimensionality Reduction using PCA for visualization

Let's use PCA to reduce the data to two dimensions and visualize it to see how well-separated the clusters are.
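A minimal sketch of the projection step, assuming scikit-learn's `PCA` and synthetic data in place of the scaled stock attributes:

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled data: 340 points, 11 numeric features.
X, labels = make_blobs(n_samples=340, centers=4, n_features=11, random_state=0)

# Project onto the two directions of highest variance.
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X)

# Share of the total variance retained by the two components.
explained = pca.explained_variance_ratio_.sum()

print("Projected shape:", X_2d.shape)
print(f"Variance explained by 2 components: {explained:.1%}")
```

A scatter plot of the two columns of `X_2d`, coloured by cluster label, then shows the separation. If the explained variance is low, clusters that overlap in the 2D plot may still be well separated in the full feature space.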

Observation:

Observation:

Comparison of K-means vs Hierarchical Clustering methods
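One way to compare the two methods is to cross-tabulate their labels and compute the adjusted Rand index, which ignores how the clusters happen to be numbered. This is a sketch on synthetic data, assuming 4 clusters for both methods; the variable names are illustrative.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the scaled data.
X, _ = make_blobs(n_samples=340, centers=4, n_features=11, random_state=0)

km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
hc_labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X)

# Rows are K-means clusters, columns are Ward clusters, cells are member counts.
overlap = pd.crosstab(km_labels, hc_labels, rownames=["K-means"], colnames=["Ward"])

# 1.0 means identical groupings; values near 0 mean agreement no better than chance.
ari = adjusted_rand_score(km_labels, hc_labels)

print(overlap)
print(f"Adjusted Rand index: {ari:.3f}")
```

The crosstab also shows which K-means cluster corresponds to which hierarchical cluster, which helps when the two methods number the same group differently.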

Insights

Recommendations

**Group 0:** This group has very high volatility and a high ROE. A higher ROE is desirable because it means the company is using its shareholders' equity efficiently to generate income. The group also has a higher P/E ratio, an indicator that investors expect higher earnings growth in the future than for companies with a lower P/E. However, Net Cash Flow, Net Income and Earnings Per Share are negative, so in my view this is not a desirable group to invest in.

**Group 1:** This group has a high current stock price, a high price change and a higher P/B ratio, meaning the stocks are more expensive. However, it shows higher Earnings Per Share, which indicates that these stocks are likely to yield good returns.

**Group 2:** This group is moderate in current price, volatility and Earnings Per Share. Its P/E ratio makes it a group worth investing in, with little risk.

**Group 3:** Most attributes of this group are moderate compared with the other groups, making it less risky and a good group to invest in.

Conclusion

Thank you